136 research outputs found

    Analysis of 1276 Haplotype-Resolved Genomes Allows Characterization of Cis- and Trans-Abundant Genes

    Many methods for haplotyping have materialized, but their application on a significant scale has been rare to date. Here we summarize analyses that were carried out in 1092 genomes from the 1000 Genomes Consortium and validated in an unprecedented number of 184 PGP genomes that have been experimentally haplotype-resolved by application of the Long-Fragment Read (LFR) technology. These analyses provided first insights into the diplotypic nature of human genomes and its potential functional implications. Thus, protein-changing variants were not randomly distributed between the two homologues of 18,121 autosomal protein-coding genes but occurred significantly more frequently in cis than in trans configurations in virtually each of the 1276 phased genomes. This resulted in global cis/trans ratios of ~60:40, establishing “cis abundance” as a universal characteristic of diploid human genomes. This phenomenon was based on two different classes of genes, a larger one exhibiting cis configurations of protein-changing variants in excess, so-called “cis-abundant” genes, and a smaller one of “trans-abundant” genes. These two gene classes, which together constitute a common diplotypic exome, were further functionally distinguished by means of gene ontology (GO) and pathway enrichment analysis. Moreover, they were distinguishable in terms of their effects on the human interactome, where they constitute distinct cis and trans modules, as shown with network propagation on a large integrated protein–protein interaction network. These analyses, recently performed with updated database and analysis tools, further consolidated the characterization of cis- and trans-abundant genes while expanding previous results. In this chapter, we present the key results along with the materials and methods to motivate readers to investigate these findings independently and gain further insights into the diplotypic nature of genes and genomes

    Hum Hered

    The inference of haplotype pairs directly from unphased genotype data is a key step in the analysis of genetic variation in relation to disease and pharmacogenetically relevant traits. Most popular methods, such as PHASE and PL, require either the coalescence assumption or the assumption of linkage between the single-nucleotide polymorphisms (SNPs). We have now developed novel approaches that are independent of these assumptions. First, we introduce a new optimization criterion in combination with a block-wise evolutionary Monte Carlo algorithm. Based on this criterion, the 'haplotype likelihood', we develop two kinds of estimators, the maximum haplotype-likelihood (MHL) estimator and its empirical Bayesian (EB) version. Using both real and simulated data sets, we demonstrate that our proposed estimators allow substantial improvements over both the expectation-maximization (EM) algorithm and Clark's procedure in terms of capacity/scalability and error rate. Thus, hundreds or more ambiguous loci and potentially very large sample sizes can be processed. Moreover, applying our proposed EB estimator can result in significant reductions of error rate in the case of unlinked or only weakly linked SNPs.

    Significant abundance of cis configurations of coding variants in diploid human genomes

    To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ~60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
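
The cis/trans ratio described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data layout, the function names `gene_configuration` and `cis_trans_ratio`, the toy gene labels, and the rule of counting only genes with two or more heterozygous protein-changing variants are all assumptions made for illustration.

```python
from collections import Counter


def gene_configuration(phased_genotypes):
    """Classify one gene in one genome as 'cis', 'trans', or None.

    phased_genotypes: list of (allele_on_homologue_1, allele_on_homologue_2)
    tuples, with 0 = reference and 1 = protein-changing variant allele.
    """
    # Assumption: only heterozygous sites are treated as phase-informative.
    het = [g for g in phased_genotypes if g[0] != g[1]]
    if len(het) < 2:
        return None  # fewer than two heterozygous variants: no configuration
    on_first = sum(1 for a, _ in het if a == 1)
    # cis: every variant allele sits on the same homologue
    if on_first == 0 or on_first == len(het):
        return "cis"
    # trans: variant alleles are split between the two homologues
    return "trans"


def cis_trans_ratio(genome):
    """Return (cis %, trans %) over the phase-informative genes of one genome."""
    counts = Counter(gene_configuration(g) for g in genome.values())
    informative = counts["cis"] + counts["trans"]
    if informative == 0:
        raise ValueError("no gene with two or more heterozygous variants")
    return (100 * counts["cis"] / informative,
            100 * counts["trans"] / informative)


# Hypothetical toy genome: gene label -> phased genotypes of its variants
toy_genome = {
    "GENE_A": [(1, 0), (1, 0)],          # both variants on homologue 1
    "GENE_B": [(1, 0), (0, 1)],          # one variant on each homologue
    "GENE_C": [(0, 1), (0, 1), (0, 1)],  # all variants on homologue 2
    "GENE_D": [(1, 0)],                  # single variant, not informative
}

cis_pct, trans_pct = cis_trans_ratio(toy_genome)
print(f"cis/trans = {cis_pct:.0f}:{trans_pct:.0f}")
```

Applied per genome and averaged across genomes, a calculation of this kind would yield a global ratio comparable in form to the ~60:40 reported above; the toy data here carry no biological meaning.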

    Multiple haplotype-resolved genomes reveal population patterns of gene and protein diplotypes

    To fully understand human biology and link genotype to phenotype, the phase of DNA variants must be known. Here we present a comprehensive analysis of haplotype-resolved genomes to assess the nature and variation of haplotypes and their pairs, diplotypes, in European population samples. We use a set of 14 haplotype-resolved genomes generated by fosmid clone-based sequencing, complemented and expanded by up to 372 statistically resolved genomes from the 1000 Genomes Project. We find immense diversity of both haploid and diploid gene forms, up to 4.1 and 3.9 million corresponding to 249 and 235 per gene on average. Less than 15% of autosomal genes have a predominant form. We describe a ‘common diplotypic proteome’, a set of 4,269 genes encoding two different proteins in over 30% of genomes. We show moreover an abundance of cis configurations of mutations in the 386 genomes with an average cis/trans ratio of 60:40, and distinguishable classes of cis- versus trans-abundant genes. This work identifies key features characterizing the diplotypic nature of human genomes and provides a conceptual and analytical framework, rich resources and novel hypotheses on the functional importance of diploidy

    Association of alpha1a-adrenergic receptor polymorphism and blood pressure phenotypes in the Brazilian population

    Background: The alpha1A-adrenergic receptor (alpha(1A)-AR) regulates the cardiac and peripheral vascular system through sympathetic activation. Due to its important role in the regulation of vascular tone and blood pressure, we aimed to investigate the association between the Arg347Cys polymorphism in the alpha(1A)-AR gene and blood pressure phenotypes, in a large sample of Brazilians from an urban population. Methods: A total of 1568 individuals were randomly selected from the general population of the Vitoria City metropolitan area. Genetic analysis of the Arg347Cys polymorphism was conducted by polymerase chain reaction/restriction fragment length polymorphism. We have compared cardiovascular risk variables and genotypes using ANOVA, and Chi-square test for univariate comparisons and logistic regression for multivariate comparisons. Results: Association analysis indicated a significant difference between genotype groups with respect to diastolic blood pressure (p = 0.04), but not systolic blood pressure (p = 0.12). In addition, presence of the Cys/Cys genotype was marginally associated with hypertension in our population (p = 0.06). Significant interaction effects were observed between the studied genetic variant, age and physical activity. Presence of the Cys/Cys genotype was associated with hypertension only in individuals with regular physical activity (odds ratio = 1.86; p = 0.03) or younger than 45 years (odds ratio = 1.27; p = 0.04). Conclusion: Physical activity and age may potentially play a role by disclosing the effects of the Cys allele on blood pressure. According to our data it is possible that the Arg347Cys polymorphism can be used as a biomarker to disease risk in a selected group of individuals. FAPESP (Fundacao de Amparo a Pesquisa do Estado de Sao Paulo)[2001/03454-5

    Model order selection for bio-molecular data clustering

    Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the “natural” number of clusters underlying the data, and in many cases we do not have enough “a priori” biological knowledge to evaluate both the number of clusters and their validity. Recently several methods based on the concept of stability have been proposed to estimate the “optimal” number of clusters, but despite their successful application to the analysis of complex bio-molecular data, the assessment of the statistical significance of the discovered clustering solutions and the detection of multiple structures simultaneously present in high-dimensional bio-molecular data are still major problems. Results: We propose a stability method based on randomized maps that exploits the high-dimensionality and relatively low cardinality that characterize bio-molecular data, by selecting subsets of randomized linear combinations of the input variables, and by using stability indices based on the overall distribution of similarity measures between multiple pairs of clusterings performed on the randomly projected data. A χ²-based statistical test is proposed to assess the significance of the clustering solutions and to detect significant and if possible multi-level structures simultaneously present in the data (e.g. hierarchical structures).

    Common Genetic Variants of the Human Steroid 21-Hydroxylase Gene (CYP21A2) Are Related to Differences in Circulating Hormone Levels

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Hungarian Scientific Research Fund (OTKA, PD100648 (AP)) Technology Innovation Fund, National Developmental Agency (KTIA-AIK-2012-12-1-0010). AP is the recipient of a “Lendület” grant from the Hungarian Academy of Sciences.